Alternative formulas for synthetic dual
system estimation in the 2000 census

Lawrence Brown and Zhanyun Zhao

University of Pennsylvania and Mathematica Policy Research 

Abstract: The U.S. Census Bureau provides an estimate of the true population
as a supplement to the basic census numbers. This estimate is constructed
from data in a post-censal survey. The overall procedure is referred to as dual
system estimation. Dual system estimation is designed to produce revised
estimates at all levels of geography, via a synthetic estimation procedure.

We design three alternative formulas for dual system estimation and
investigate the differences in area estimates produced as a result of using those
formulas. The primary target of this exercise is to better understand the nature
of the homogeneity assumptions involved in dual system estimation and their
consequences when used for the enumeration data that occurs in an actual
large scale application like the Census. (Assumptions of this nature are sometimes
collectively referred to as the "synthetic assumption" for dual system
estimation.)

The specific focus of our study is the treatment of the category of census 
counts referred to as imputations in dual system estimation. Our results show 
the degree to which varying treatment of these imputation counts can result 
in differences in population estimates for local areas such as states or counties. 

1. Introduction 

The U.S. census is required by the Constitution to be conducted every ten years.
In an attempt to provide better estimates of the true population than contained
in the basic census counts, the Census Bureau [13] uses both statistical and demographic
methods. In 2000 the statistical process was called Accuracy and Coverage
Evaluation (A.C.E.).

The 2000 A.C.E. data consists of two parts: the Population sample (P-sample)
and the Enumeration sample (E-sample). The P-sample includes persons who are
validly included in the A.C.E. survey, and the E-sample includes census enumerations
from households in the A.C.E. block clusters. For a detailed overview of the
2000 A.C.E., please see Hogan [9] and Norwood and Citro [11].

The 2000 A.C.E. was designed to get an estimate of the population at every 
geographic level, based on the census count and the information from the E-sample 
and the P-sample. To be more precise, the procedure adopted by the Census Bureau 
is termed a synthetic dual system estimate. Its validity rests on several assumptions, 
including a major synthetic (homogeneity) assumption.

The synthetic assumption can take various technical forms. These
affect the details of the formulas needed to produce the final population estimates.
For ideal and homogeneous populations any of the resulting formulas will produce
unbiased estimates. However, the U.S. population does not appear to have this type
of ideal structure. Hence different synthetic assumptions yield different estimates,
and it does not appear that all of these estimates are actually unbiased.

This paper investigates the nature of these assumptions and the extent of the 
differences produced when using three alternative dual system formulas within the 
2000 U.S. Census. It should be emphasized that the data available to us do not 
allow us to make any confident claim as to which of the estimates is more accurate; 
indeed such a claim is not our objective. Instead, we present our analyses as a 
means of providing better understanding of the dual system estimation process in 
the presence of actual populations, such as that encountered in the 2000 Census, 
and of judging the extent of differences that may be expected to result from differing 
assumptions about the census enumeration process. 

Our analysis revolves around the extent and homogeneity of imputations of
household and whole person records into the census enumeration. The available
data allows us to produce alternative estimates based on different treatment of
these imputations. As we later remark, there are other aspects of the dual system
process that might involve analogous biases in the presence of inhomogeneity;
however, the data available to us do not allow for as complete an analysis relative to
those factors.

In Section 2 we briefly discuss the nature and extent of imputation in the 2000 
census. It is clear that the desired stochastic homogeneity does not hold there. 
Section 3 introduces background for dual system estimation and the synthetic
assumption. The alternative formulas are presented in Section 4. Section 5 displays
the results of using these formulas to estimate the true population shares of the 
states in 2000. Section 6 presents similar results for estimation of population shares 
of groups of counties. Mathematical comparison of different formulas is made in 
Section 7. Section 8 contains a summary conclusion and remarks. 

The data for A.C.E. was collected during the 2000 census and first prepared and 
analyzed before April 2001. The Census Bureau decided not to issue the results 
then produced as official census estimates. Following this, the data was re-analyzed
several times, leading up to revised A.C.E. estimates, referred to as A.C.E. Revision II.
These were released in March 2003. The revised data identified, and deleted
from the estimation process, a significant number of records that were judged to
be duplicates. There were also a number of other more technical, but not insignificant,
innovations in A.C.E. Revision II. See Kostanich [10] for a more complete
description of A.C.E. Revision II.

The analyses of our paper are based on the original April 2001 A.C.E. data.
There are several reasons for our using this original data, rather than the
A.C.E. Revision II data. The primary reason is that this is the data that was supplied to
us by the Bureau, beginning in 2001. (We gratefully acknowledge the Bureau's
assistance in supplying us with suitable versions of this data.) Furthermore, our
purpose has been to understand the nature of traditional dual system estimation,
and the consequences of alternate synthetic assumptions. For the most part the
relation of the April 2001 A.C.E. data to the census is analogous to that
between earlier censuses and their dual system surveys. (In particular, both the 2000
census counts and the 2001 A.C.E. data contain correspondingly significant numbers
of duplicates, such as presumably existed in earlier census data even though there
was no way to explicitly identify them. See Section 2 on imputation for discussion
of one difference between 2000 and earlier censuses.) Furthermore, the analysis of
A.C.E. Revision II involves a number of special complications and assumptions beyond those
of the standard dual system analyses.

2. Imputation 

We use II, the Census Bureau's notation, to denote the number of imputations.
Technically II is referred to as "insufficient information". It is not unusual for some
census records to contain incomplete information to a modest extent. If all or nearly
all relevant information is missing so that the matching of the P-sample records to
the E-sample enumerations is not feasible, then the record is described as having
insufficient information. Here we use the word "imputation" generally to describe
records that for some reason do not include enough information to be included in
the A.C.E. process. Broadly speaking, census imputation also includes imputation
for item non-response for records in the A.C.E., and imputation for matching status
in the A.C.E. process. In our context, however, imputation refers only to whole
records not included in the A.C.E. process due to insufficient information. In the
2000 census, imputations had two components: inherent imputations and late adds.

One can identify two basic kinds of inherent imputation. Sometimes we do know
with reasonable certainty how many people there are in the household, but lack
the personal information about them that is needed for the matching of the E-sample
and the P-sample in the dual system process. In this case, we just need to impute
demographic information for each person. On the other hand, sometimes the actual
number of people in the household is also unknown. In this circumstance both the
true counts and personal information need to be imputed. It is even possible to give
a finer subdivision of types of inherent imputations. See Norwood and Citro [11].

Imputation related to a large number of late adds was a special feature of the
2000 census. Because of its concern about address duplication, the Census Bureau
created a special research program just after the basic census data was collected.
The Bureau identified and removed approximately 6 million person
records in 2.4 million housing units as potential duplicates. Later on, approximately
2.4 million persons in 1 million housing units were reinstated into the census. However,
this was too late for the 2.4 million people to be included in the A.C.E.
process. Hence they were referred to as "Late Adds" and were treated similarly to
imputation data. For details of research on duplicates, see ESCAP [4]. Table 1 is a
comparison of the distributions of imputation in 1990 and 2000.

Besides the fact that there was no special treatment for Late Adds in the 1990
census, the 1990 and 2000 censuses differ significantly in the ratio of imputations from
households with known person count to imputations from households with unknown
person count. In 2000, that ratio was about 4, whereas in 1990 it was 44,
more than ten times as large.



Table 1

Number of imputations (II) as a percentage of census count (C)

Imputation type         2000 Census   1990 Census
Known Person Count          1.68          0.88
Unknown Person Count        0.43          0.02
Late Adds                   0.85          0.00
Total                       2.96          0.90

(Source: The 2000 Census: Interim Assessment)

The percentages of II from the 1980 census were more similar to those of 2000 than
were the 1990 percentages.
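
The ratios quoted in the text follow directly from the percentages in Table 1. As a quick arithmetic check, the short Python sketch below recomputes them; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# Percentages of the census count, copied from Table 1.
table1 = {
    2000: {"known": 1.68, "unknown": 0.43, "late_adds": 0.85},
    1990: {"known": 0.88, "unknown": 0.02, "late_adds": 0.00},
}


def known_to_unknown_ratio(row):
    """Ratio of known-person-count to unknown-person-count imputations."""
    return row["known"] / row["unknown"]


ratios = {year: known_to_unknown_ratio(row) for year, row in table1.items()}
totals = {year: sum(row.values()) for year, row in table1.items()}

for year in sorted(table1):
    print(year, "ratio = %.1f" % ratios[year], "total II = %.2f%%" % totals[year])
```

The 2000 ratio comes out a little under 4 and the 1990 ratio at 44, in line with the figures quoted above.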

In this paper, the quantity C - II denotes the number of people with full information.
They are frequently referred to as "data-defined" persons, and we use DD to denote
their number in the following sections.



3. Dual system estimation 



As noted in Section 1, the 2000 A.C.E. data consists of the E-sample and the P-sample.
Based on the information in the E-sample and the P-sample, a dual system
estimate of the population is produced for special subgroups, called post-strata.
These post-strata estimates are then apportioned and recombined so as to form
estimates for any geographic area, such as state, county, census block etc. We now
discuss some aspects of this procedure.



3.1. Post-stratification

For the purpose of analysis, the population is divided into certain groups called
post-strata. Sixty-four post-stratum groups were created based on information about
geographic location, race, Hispanic origin, housing tenure etc. In addition there
were 7 age/sex categories. Thus originally there were 448 post-strata. Later on,
some small post-strata were collapsed together to form 416 final post-strata. [See
Table 5 in the Appendix for details of the construction of post-strata.]

3.2. Dual system estimation 

The dual system estimate for post-stratum i can be written as

(1) $DSE_i = DD_i \times CR_i \times \dfrac{1}{MR_i}$

Here $DD_i$ is the number of data-defined persons in post-stratum i. $CR_i$ and
$MR_i$ are the estimates of the E-sample correct enumeration rate and the P-sample
matching rate respectively.

In the E-sample, enumerations are divided into two categories: correct enumerations
and erroneous enumerations. The correct enumeration rate measures the
accuracy of the census. It is estimated as

(2) $CR_i = \dfrac{CE_i}{CE_i + EE_i}$

where $CE_i$ denotes the number of correct enumerations and $EE_i$ denotes the number
of erroneous enumerations in post-stratum i.
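
To make formula (1) and the correct enumeration rate concrete, here is a minimal Python sketch. It assumes unweighted counts and takes the matching rate as a given number; the actual A.C.E. estimates use sample weights and a more involved matching-rate formula, so the function names and all input values below are hypothetical.

```python
def correct_enumeration_rate(ce, ee):
    """CR_i = CE_i / (CE_i + EE_i), from unweighted E-sample counts (a simplification)."""
    total = ce + ee
    if total <= 0:
        raise ValueError("the E-sample must contain at least one enumeration")
    return ce / total


def dual_system_estimate(dd, cr, mr):
    """DSE_i = DD_i * CR_i / MR_i, as in formula (1)."""
    if not 0.0 < mr <= 1.0:
        raise ValueError("the matching rate must lie in (0, 1]")
    return dd * cr / mr


# Hypothetical post-stratum: 100,000 data-defined persons, 95% correct
# enumerations in the E-sample, 90% of P-sample persons matched.
cr = correct_enumeration_rate(ce=950.0, ee=50.0)
dse = dual_system_estimate(dd=100_000, cr=cr, mr=0.90)
print(round(cr, 4), round(dse))
```

With these made-up inputs the estimate exceeds the data-defined count, because the shortfall in matching outweighs the erroneous enumerations.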

The P-sample persons are taken into a matching procedure to see whether they 
can be matched with persons in the E-sample. The P-sample matching rate then 
measures the coverage of the census. The formula for MRi is more complicated 
than that for the other elements of (1), and it is not particularly pertinent to the 
current considerations. The reader should consult Hogan [8] for details. 

Since it was adopted by the Census Bureau to estimate the population, the dual
system estimation method has been considered in principle a large-scale capture-recapture
procedure. It can be motivated from an over-simplified, primitive model
for capture-recapture estimation. In this model, the interrelation of the P-sample
and the E-sample can be schematically summarized in a two by two table, and
elements in the two by two table are estimated based on the assumption of the
independence of the E-sample and the P-sample. For a detailed overview of dual
system estimation, see Hogan [7].
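
The primitive capture-recapture model can be illustrated with the classical Lincoln-Petersen estimator, which fills in the unobserved cell of the two by two table under independence. The sketch below is our illustration of that textbook estimator, not the Bureau's procedure; the function name and the counts are hypothetical.

```python
def petersen_table(n_census, n_psample, n_matched):
    """Complete the two-by-two capture-recapture table under independence.

    n_census  : persons captured by the first system (census side)
    n_psample : persons captured by the second system (survey side)
    n_matched : persons captured by both systems
    """
    if n_matched <= 0:
        raise ValueError("at least one matched person is required")
    if n_matched > min(n_census, n_psample):
        raise ValueError("matches cannot exceed either capture count")
    census_only = n_census - n_matched
    psample_only = n_psample - n_matched
    # Independence implies neither / psample_only == census_only / n_matched.
    neither = census_only * psample_only / n_matched
    return {
        "both": n_matched,
        "census_only": census_only,
        "psample_only": psample_only,
        "neither": neither,
        "total": n_census * n_psample / n_matched,
    }


table = petersen_table(n_census=900, n_psample=100, n_matched=90)
print(table)
```

The four cells add up to the estimated total, which is the product of the two capture counts divided by the number of matches.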



3.3. Synthetic assumption 

The census provides population figures for geographic subdivisions much smaller 
than those defined by post-stratum boundaries. These "smaller areas" include 
states, congressional districts, metropolitan areas, and even divisions as small as 
census tracts and census blocks within tracts. 

In order to get smaller area estimates, the estimates DSEi for each post-stratum
must be divided up and apportioned to geographic areas lying within that
post-stratum. This procedure is called synthetic estimation and the assumption(s) that
support its validity is (are) referred to as the synthetic assumption.

It seems to us that there are various reasonable forms of synthetic assumptions
that could be proposed, and these lead in practice to different smaller area population
estimates. For now we first present the formula implemented by the Bureau.
Then we later contrast it with alternative formulas that also seem to us to be
plausible.

For the purpose of synthetic estimation, the Census Bureau assumes that the
estimate, $DSE_i$, should be divided in proportion to the total census counts within
its post-stratum. Let the index k, k = 1, 2, ..., $K_i$, refer to geographic subregions
within post-stratum i. Let $C_{ik}$ denote the total census counts for post-stratum i and
region k, and let $C_i$ denote the totals for the post-stratum. The Bureau population
estimate for post-stratum i region k is then called $DSE_{ik}$ or $S_{ik}$ and is given by
the formula

(3) $S_{ik} = DSE_{ik} = \dfrac{C_{ik}}{C_i}\, DSE_i.$

This reflects the Bureau's synthetic assumption that the population distribution 
for smaller areas within a post-stratum is homogeneous with respect to the census 
counts for those areas within that post-stratum. 

Formula (3) is often rephrased in a different but equivalent format. Define the
Coverage Correction Factor for post-stratum i ($CCF_i$) by

(4) $CCF_i = \dfrac{DSE_i}{C_i}.$

Then

(5) $S_{ik} = C_{ik}\, CCF_i.$

There is a different but equivalent way to interpret (3) or (5). The Census Bureau's
estimate can also be written as

(6) $S_{ik} = C_{ik} + (DSE_i - C_i) \times \dfrac{C_{ik}}{C_i}.$

We will later build upon this interpretation.
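
The equivalence of the multiplicative form (5) and the additive form (6) can be checked numerically. The following Python sketch does so for a single post-stratum; the region labels, counts and DSE value are hypothetical, and the function names are ours.

```python
def bureau_multiplicative(dse_i, counts):
    """Formula (5): S_ik = C_ik * CCF_i, with CCF_i = DSE_i / C_i."""
    ccf_i = dse_i / sum(counts.values())
    return {k: c_ik * ccf_i for k, c_ik in counts.items()}


def bureau_additive(dse_i, counts):
    """Formula (6): census count plus a share of the estimated undercount."""
    c_i = sum(counts.values())
    undercount = dse_i - c_i
    return {k: c_ik + undercount * c_ik / c_i for k, c_ik in counts.items()}


# Hypothetical post-stratum split over three regions.
counts = {"A": 600.0, "B": 300.0, "C": 100.0}
dse_i = 1050.0

s_mult = bureau_multiplicative(dse_i, counts)
s_add = bureau_additive(dse_i, counts)
for k in counts:
    print(k, round(s_mult[k], 3), round(s_add[k], 3))
```

Both forms allocate the same amounts; the additive form makes explicit that the estimated undercount $DSE_i - C_i$ is spread in proportion to $C_{ik}$.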
In summary, for geographic region k this gives the following population estimate:

(7) $S_k = \sum_i S_{ik} = \sum_i C_{ik}\, CCF_i.$

Here in (7), Sk is called the synthetic dual system estimate, abbreviated as 
SynDSE. It is clear from its definition that it applies the same adjustment factor 
for people in each post-stratum, and aggregates the adjusted post-stratum level 
population numbers for an estimate of the population of the entire geographic area. 
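
As an illustration of (7), the sketch below aggregates post-stratum level adjustments into estimates for geographic areas. The nested-dictionary layout, the post-stratum and area labels, and all numbers are assumptions made for this example only.

```python
def syn_dse(counts, dse):
    """S_k = sum over post-strata i of C_ik * CCF_i, as in formula (7).

    counts[i][k] : census count for post-stratum i in area k
    dse[i]       : dual system estimate for post-stratum i
    """
    totals = {}
    for i, by_area in counts.items():
        ccf_i = dse[i] / sum(by_area.values())  # one factor per post-stratum
        for k, c_ik in by_area.items():
            totals[k] = totals.get(k, 0.0) + c_ik * ccf_i
    return totals


# Two hypothetical post-strata spread over two areas.
counts = {
    "ps1": {"X": 600.0, "Y": 400.0},
    "ps2": {"X": 100.0, "Y": 300.0},
}
dse = {"ps1": 1050.0, "ps2": 440.0}

area_estimates = syn_dse(counts, dse)
print({k: round(v, 3) for k, v in area_estimates.items()})
```

Area X is adjusted by less than area Y here because more of Y's count lies in the post-stratum with the larger coverage correction factor.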

3.4. Rationale for post-stratification

The preceding discussion highlights one main rationale and target for post-stratification.
Accuracy of the synthetic estimation formula (3) rests on the assumption
that the population for the geographic areas within post-strata is distributed in
proportion to the census count.

There are at least two other reasons for post-stratification in connection with
dual system estimation. The logic supporting the dual system estimate requires
that the matching rate be constant for individuals within post-strata. Violation of
this will, in general, lead to bias in the dual system estimate (1) of the post-stratum
population. Such a situation is referred to as "correlation bias". There are many
discussions of correlation bias in the literature. For example, Sekar and Deming
[12] gave an early discussion of correlation bias. Bell [1] introduced a third system
to estimate the correlation bias. Freedman and Wachter [5] also discussed
correlation bias and heterogeneity. Zhao [14] investigated the data of the 2000
census to test the plausibility of the assumption of absence of correlation bias.

A third, though perhaps less important, rationale for post-stratification is that,
in principle, suitably chosen post-strata can reduce the variance of estimates given
through formulas such as (1) and (3). Conversely, a choice of too many post-strata
with consequently small sample sizes within each post-stratum can lead to estimators
with inflated variances. See Hogan [7] for a discussion of this in relation to the
1990 census. See Freedman and Wachter [6] for a perspective on post-stratification
and its effects in the 2000 census.

4. Alternative formulas 

In this section, we present three alternative formulas for synthetic estimation. The
Census Bureau's formula is based on the synthetic assumption that the population
distribution for small areas within a post-stratum is homogeneous with respect to
the census counts (including imputations) for those areas within that post-stratum.
Our alternative formulas are sensitive to the homogeneity of imputations in the
census, and its role in the synthetic estimation of subpopulation counts.

4.1. First alternative formula 

Note that the estimates $DSE_i$ are computed only from enumerations of data-defined
people. That is because $C_i$ does not appear in (1). Thus the estimates $DSE_i$ of
post-stratum totals involve DD directly, but do not involve the number of counts
labelled as II. It can thus be plausibly argued that the counts II should also not
play a role in distributing $DSE_i$ geographically within post-strata.

As noted in Section 3.4, homogeneity assumptions relative to the components of
(1) are already part of the general justification for dual system estimation. From
this perspective, it also seems reasonable to assume that the population for the geographic
area within post-strata should be proportional to the enumeration of data-defined
people. This form of synthetic assumption leads to the alternate estimate
$S^1_{ik}$ described by the formula

(8) $S^1_{ik} = DSE^1_{ik} = \dfrac{DD_{ik}}{DD_i}\, DSE_i,$

where $DD_{ik}$ is the number of data-defined persons in geographic region k within
post-stratum i, i = 1, 2, ..., I, k = 1, 2, ..., $K_i$.

There is another way to view the formula for $S^1_{ik}$. For each post-stratum i,
consider $DCF_i$ (Data-defined Coverage Factor) as a replacement of $CCF_i$. Their
relationship is described in the following formula

(9) $DCF_i = CCF_i \times \dfrac{C_i}{DD_i}.$

Then applying the same Data-defined Coverage Factor for post-stratum i to the
number of data-defined persons in geographic region k within post-stratum i, the
corresponding $S^1_{ik}$ for geographic level k is thus written as

(10) $S^1_{ik} = DD_{ik}\, DCF_i.$

Since (9) implies that $DCF_i = DSE_i / DD_i$, it is easy to show that (8) and (10)
are equivalent.
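
The equivalence of (8) and (10) can also be verified numerically. The Python sketch below assumes $DCF_i = DSE_i / DD_i$, which is consistent with the term $DD_i \times DCF_i$ that appears in (11); the data-defined counts and the DSE value are hypothetical.

```python
def first_alternative(dse_i, dd):
    """Formula (8): split DSE_i in proportion to data-defined counts DD_ik."""
    dd_i = sum(dd.values())
    return {k: dd_ik / dd_i * dse_i for k, dd_ik in dd.items()}


def first_alternative_via_dcf(dse_i, dd):
    """Formula (10): S1_ik = DD_ik * DCF_i, assuming DCF_i = DSE_i / DD_i."""
    dcf_i = dse_i / sum(dd.values())
    return {k: dd_ik * dcf_i for k, dd_ik in dd.items()}


# Hypothetical data-defined counts for three regions of one post-stratum.
dd = {"A": 580.0, "B": 295.0, "C": 95.0}
dse_i = 1050.0

s1 = first_alternative(dse_i, dd)
s1_dcf = first_alternative_via_dcf(dse_i, dd)
for k in dd:
    print(k, round(s1[k], 3), round(s1_dcf[k], 3))
```

Imputed records never enter this computation, which is the defining feature of the first alternative.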



4.2. Second alternative formula

It can be plausibly argued that the distribution of imputations $II_{ik} = C_{ik} - DD_{ik}$,
k = 1, 2, ..., $K_i$, is a valid reflection of the distribution of the true undercount relative
to $C_{ik}$ within the post-stratum. Presumably imputations are concentrated in areas
where it is intrinsically hard to count people, and hence areas with high undercount
rate would be expected to have high imputation rate. Since the "true" undercount
is not observed, it is hard, or impossible, to devise a way to check this assertion. If
it were valid, then the desirable estimate for the true population would be derived
by distributing the post-stratum undercount estimates within the post-stratum in
proportion to $II_{ik}$. This leads to the formula



(11) $S^2_{ik} = C_{ik} + (DSE_i - C_i) \times \dfrac{II_{ik}}{II_i} = C_{ik} + (DD_i \times DCF_i - C_i) \times \dfrac{II_{ik}}{II_i}.$



As we noted before, the estimate of the total undercount for post-stratum i is
$DSE_i - C_i$, and this undercount is distributed to each geographic level proportionally
to its share of the imputations within the post-stratum. The estimate for the population
is then the census counts plus the estimated undercount. In summary, this formula
is the same as (6) except that $II_{ik}/II_i$ is substituted for $C_{ik}/C_i$.
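
A small worked example may help to see how (11) concentrates the adjustment where imputations are concentrated. The sketch below is illustrative only: the counts are hypothetical, and the guard against a post-stratum with no imputations is our own addition, since (11) is undefined in that case.

```python
def second_alternative(dse_i, counts, dd):
    """Formula (11): distribute the undercount DSE_i - C_i in proportion to II_ik."""
    ii = {k: counts[k] - dd[k] for k in counts}  # II_ik = C_ik - DD_ik
    ii_i = sum(ii.values())
    if ii_i <= 0:
        raise ValueError("formula (11) needs at least one imputation in the post-stratum")
    undercount = dse_i - sum(counts.values())
    return {k: counts[k] + undercount * ii[k] / ii_i for k in counts}


# Hypothetical post-stratum: region A holds two thirds of the imputations.
counts = {"A": 600.0, "B": 300.0, "C": 100.0}
dd = {"A": 580.0, "B": 295.0, "C": 95.0}
dse_i = 1050.0

s2 = second_alternative(dse_i, counts, dd)
print({k: round(v, 2) for k, v in s2.items()})
```

Region A has 60% of the census count but two thirds of the imputations, so it receives two thirds of the estimated undercount of 50.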
4.3. Third alternative formula

Note that the Census Bureau's formula (6) is $S_{ik} = C_{ik} + (DSE_i - C_i) \times C_{ik}/C_i$.
Compare this with (11), and another reasonable formula comes out naturally as

(12) $S^3_{ik} = C_{ik} + (DSE_i - C_i) \times \dfrac{DD_{ik}}{DD_i}.$

In words, this formula begins from a base of the census counts $C_{ik}$ (including
$II_{ik}$). It then considers the distribution of $DD_{ik}$ as a reflection of the true undercount
rate at geographic level k within post-strata.

Clearly all of the formulas presented here have the same normalization property

(13) $\sum_k S^j_{ik} = \sum_k S_{ik} = DSE_i, \quad j = 1, 2, 3.$

Also, if we take the summation over post-stratum index i, then we will have the
estimate of the population at geographic area k as

(14) $S^j_k = \sum_i S^j_{ik}, \quad j = 1, 2, 3.$
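
One way to see the normalization property (13) is to note that the Bureau's formula and the three alternatives all take the form "base plus a share of the gap between $DSE_i$ and the base total". The Python sketch below uses that unified view, which is our own framing rather than the paper's, with hypothetical counts.

```python
def allocate(dse_i, base, weights):
    """Start from `base` and spread DSE_i - sum(base) in proportion to `weights`."""
    gap = dse_i - sum(base.values())
    weight_total = sum(weights.values())
    return {k: base[k] + gap * weights[k] / weight_total for k in base}


# Hypothetical post-stratum with three regions.
counts = {"A": 600.0, "B": 300.0, "C": 100.0}      # C_ik
dd = {"A": 580.0, "B": 295.0, "C": 95.0}           # DD_ik
ii = {k: counts[k] - dd[k] for k in counts}        # II_ik
dse_i = 1050.0

estimates = {
    "bureau": allocate(dse_i, counts, counts),  # formula (6)
    "alt1": allocate(dse_i, dd, dd),            # formula (8), rewritten additively
    "alt2": allocate(dse_i, counts, ii),        # formula (11)
    "alt3": allocate(dse_i, counts, dd),        # formula (12)
}

for name, est in estimates.items():
    print(name, {k: round(v, 1) for k, v in est.items()}, round(sum(est.values()), 6))
```

Whatever base and weights are used, the regional estimates sum to $DSE_i$; the formulas differ only in how that total is split across regions.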



5. Results from alternative formulas at state level 



5.1. Comparison of shares at state level



Allocating seats in the House of Representatives is the original constitutional mandate
for which the decennial census was established. Much attention was put on
which states had gained or lost seats. It is of primary interest to compare different
formulas at the state level.

Figure 1 shows a comparison of the alternative formulas and the Census Bureau's formula
for the 16 largest states. [See Figure 5 in the Appendix for the full comparison
of all 51 states.] The comparison is made in terms of population shares. A state's
population share is normally defined as its percentage of the national total. By (13),
the choice of formula does not affect the estimate of the national total. The horizontal line for each state
shows the confidence interval of the share difference: SynDSE ($S_k$) share minus census
share. The standard error of the share difference is computed from Davis [3], published
by the Census Bureau. The square represents the share difference between $S_k$ and
census, the dot represents the share difference between $S^1_k$ and census, and the triangle
represents the share difference between $S^2_k$ and census. The share difference
between $S^3_k$ and census is omitted from the figure since it is very close to the one
between $S_k$ and census.
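
The share difference plotted in Figure 1 can be computed as in the following Python sketch. The three "states" and their counts are hypothetical and the function names are ours; the point is only to show that share differences sum to zero, so that a state can lose share even when its estimated count goes up.

```python
def shares(estimates):
    """Each area's percentage of the total over all areas."""
    total = sum(estimates.values())
    return {k: 100.0 * v / total for k, v in estimates.items()}


def share_differences(adjusted, census):
    """Adjusted share minus census share, in percentage points."""
    adj, cen = shares(adjusted), shares(census)
    return {k: adj[k] - cen[k] for k in census}


# Hypothetical census counts and synthetic DSEs for three areas.
census = {"A": 500.0, "B": 300.0, "C": 200.0}
syn_dse = {"A": 520.0, "B": 306.0, "C": 204.0}

diffs = share_differences(syn_dse, census)
print({k: round(v, 3) for k, v in diffs.items()})
```

Here every area's count is adjusted upward, yet areas B and C lose share because area A is adjusted by a larger percentage.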

The most prominent feature is for the state of New York, where the difference
calculated from $S^1_k$ falls very far outside of (below) the confidence interval calculated
from the census formula. For several other states the result for $S^1_k$ is also outside the
confidence interval (above, as for North Carolina, Virginia, and Ohio, or below,
as for Indiana and Illinois). $S^2_k$ agrees better with the census formula. For several
large states, such as Texas, California, Florida and Pennsylvania, the square and
the triangle are very close to each other. The result for New York is driven towards
0, although it still falls outside (above) the confidence interval.

[Figure 1: share difference by state (horizontal axis from about -0.1 to 0.1) for Texas, California, Georgia, North Carolina, Washington, Virginia, Florida, Tennessee, New Jersey, New York, Indiana, Massachusetts, Illinois, Pennsylvania, Michigan and Ohio. Legend: CB's formula, Alter. formula 1, Alter. formula 2.]

Fig 1. State level shares comparison from different formulas.



Interestingly, most of the time, the share difference of $S_k$ and census falls between
the difference of $S^1_k$ and census, and the difference of $S^2_k$ and census. This tells us
that, in a sense, the census formula is a compromise between the two alternatives we
introduced.



5.2. Role of imputation 

Imputations create the primary difference in practice between the Bureau's synthetic
formula (3) and alternative formulas such as our (8), (11) and (12). Note
that the assumption justifying (8) is that the undercount is homogeneous with respect
to $DD_{ik}$ for regions within post-strata. In contrast, the assumption justifying
(3) is that of homogeneity with respect to $C_{ik} = DD_{ik} + II_{ik}$. If the imputation
rates were stochastically homogeneous with respect to $C_{ik}$, then both formulas
would have the same expectation, and would generally yield very similar results in
practice.
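
The role of homogeneity can be seen in a deterministic toy version of this argument (the text's statement concerns expectations under stochastic homogeneity, which the sketch does not model). With identical imputation rates in every region, formulas (3) and (8) coincide exactly; with unequal rates they diverge. All counts and rates below are hypothetical.

```python
def bureau(dse_i, counts):
    """Formula (3): split DSE_i in proportion to census counts C_ik."""
    c_i = sum(counts.values())
    return {k: c / c_i * dse_i for k, c in counts.items()}


def alt1(dse_i, dd):
    """Formula (8): split DSE_i in proportion to data-defined counts DD_ik."""
    dd_i = sum(dd.values())
    return {k: d / dd_i * dse_i for k, d in dd.items()}


def data_defined(counts, rates):
    """DD_ik = C_ik * (1 - imputation rate in region k)."""
    return {k: counts[k] * (1.0 - rates[k]) for k in counts}


counts = {"A": 600.0, "B": 300.0, "C": 100.0}
dse_i = 1050.0
same_rates = {"A": 0.03, "B": 0.03, "C": 0.03}
mixed_rates = {"A": 0.05, "B": 0.02, "C": 0.01}

s = bureau(dse_i, counts)
gap_same = {k: alt1(dse_i, data_defined(counts, same_rates))[k] - s[k] for k in counts}
gap_mixed = {k: alt1(dse_i, data_defined(counts, mixed_rates))[k] - s[k] for k in counts}
print({k: round(v, 3) for k, v in gap_same.items()})
print({k: round(v, 3) for k, v in gap_mixed.items()})
```

In the mixed case, region A has the highest imputation rate and therefore receives less under (8) than under (3).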

Imputation rates for the 16 large states of Figure 1, together with the population 
shares from the census, are given in Table 2. In this table, the total imputation rates, 
the imputation rates from late adds (LA) and non late adds (Non-LA), as well as 
the census shares are listed. [See Table 6 in the Appendix for the full table for all 
51 states.] The overall imputation rate for New York is considerably larger than 
the national rate of 3%. 

Table 2

Imputation rates for the 16 states (rates and shares in percent)

State  II(Tot)  II(Non-LA)  II(LA)  Census Share  Number of    Mean II(Tot) of
                                                  post-strata  post-strata
NY     4.913    3.201       1.712    6.724        256          5.511
TX     3.508    2.633       0.875    7.417        264          3.962
IL     3.383    2.469       0.914    4.422        246          4.380
GA     3.300    2.349       0.951    2.907        234          3.994
CA     3.255    2.720       0.535   12.08         260          4.360
NJ     2.869    2.008       0.861    3.004        194          4.849
NC     2.795    1.640       1.156    2.849        236          3.558
IN     2.700    2.202       0.498    2.157        246          4.106
FL     2.672    2.113       0.558    5.700        236          4.534
MA     2.468    1.558       0.909    2.240        253          3.239
TN     2.465    1.599       0.867    2.025        222          3.372
WA     2.407    1.894       0.513    2.105        236          3.410
PA     2.322    1.574       0.748    4.331        242          3.595
VA     2.283    1.555       0.727    2.503        260          3.536
MI     1.876    1.341       0.536    3.541        260          3.039
OH     1.680    1.123       0.557    4.040        222          2.699

Furthermore, what really matters is the imputation rates within post-strata
within the state relative to those post-strata results elsewhere. Because of this
it seems informative to supplement the overall imputation rates given in the table
with per post-strata averages. As a result, Table 2 also gives the mean imputation
rate per post-strata within state as computed from the following formula:

(15) $MIR_k = \dfrac{1}{n_k} \sum_i \dfrac{II_{ik}}{C_{ik}} \times 100\%$

where $n_k$ is the number of post-strata within the state with non-zero census counts,
which is also listed in the table. Even a cursory examination of these imputation
rates in the census reveals that an assumption for the imputations of stochastic homogeneity
within post-strata is not reasonable. (A valid, formal test of this statistical
hypothesis can be derived using the methods of Zhao [14]. This test decisively
rejects the null hypothesis of stochastic homogeneity, with a p-value < 0.0001.)
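
The difference between a state's overall imputation rate and its mean imputation rate per post-stratum can be illustrated as follows. The sketch assumes that the mean in (15) is an unweighted average of $II_{ik}/C_{ik}$ over the post-strata with non-zero census counts; the cell values are hypothetical.

```python
def mean_imputation_rate(cells):
    """Unweighted mean of II_ik / C_ik (in percent) over post-strata with C_ik > 0.

    cells : list of (census_count, imputations) pairs, one per post-stratum
    """
    rates = [100.0 * ii / c for c, ii in cells if c > 0]
    if not rates:
        raise ValueError("no post-stratum with a non-zero census count")
    return sum(rates) / len(rates)


def overall_imputation_rate(cells):
    """Total imputations as a percentage of the total census count."""
    total_c = sum(c for c, _ in cells)
    total_ii = sum(ii for _, ii in cells)
    return 100.0 * total_ii / total_c


# Hypothetical state: one large post-stratum with a low rate, two small ones
# with high rates, and one empty post-stratum that is skipped.
cells = [(9000.0, 180.0), (800.0, 48.0), (200.0, 20.0), (0.0, 0.0)]

mir = mean_imputation_rate(cells)
overall = overall_imputation_rate(cells)
print(round(mir, 2), round(overall, 2))
```

Small post-strata with high imputation rates pull the unweighted mean well above the overall rate, which is the pattern Table 2 shows for most states.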

In Table 2, the comparison of New York and New Jersey points to an interesting
phenomenon. Overall New Jersey has an imputation rate of 2.869%. This is fairly
close to the national average. But it shares a lot of post-strata with New York. The
mean value of the imputation rates per post-strata in New Jersey is 4.849%. This
is the second highest among the 16 states. Yet as shown in Figure 1, in contrast to
New York, the differences for New Jersey using $S^1_k$ and $S^2_k$ are quite close to that
using the Census Bureau's $S_k$. The result is that although New Jersey has a relatively
high mean imputation rate per post-strata, its population estimate is not increased
as much by the dual system as this might seem to warrant. One explanation for
this is that an important neighboring state (New York) has even higher imputation
rates.

From another point of view, we can consider our alternative formula 1 as
providing a basic estimate of the population, while the Census Bureau's formula can be
viewed as an attempt to use imputations with the hope of improving these basic
estimates.

6. Results from alternative formulas at county-group level 

To better investigate the differences among all the formulas, we conduct a further
analysis at a finer level: the county-group level. Ideally our analysis might have
been performed on the level of congressional districts. However we had only county
level data to work with. Hence we created county groups to roughly approximate the
size and geographic contiguity of congressional districts. (In some cases our county
groups were much more populous than congressional districts since we could not
split counties into smaller districts.) In general, small adjacent counties are lumped
to form a group with population roughly like that of a congressional district, while relatively
large counties (for example, a county containing several congressional districts)
form county-groups by themselves. In total we created 369 county-groups,
each having on average 730,000 people.

For each county-group, an adjusted estimate (SynDSE) is constructed by the
Census Bureau's formula and our alternative formulas 1, 2 and 3. It seems most
suitable to compare the adjustments to the relative shares. This is consistent with 
the discussion in Brown et al. [2] and Freedman and Wachter [5]. However we 
found direct statements of share differences to be less suitable in part because of 
unfamiliarity with the county-groups and variability in their sizes. Hence it seems 
more informative to express the adjustments in percentage terms from a base of 
the original census numbers. It can be easily shown that this measure is a linear 
transformation of the share difference, and as noted in the above references, the 
results from the percent adjustment would be consistently comparable to the share 
difference. 

There are two possible choices of the base of the original census numbers. Naturally
one would consider the census counts, and the relative percent difference
can be expressed as

(16) $\mathrm{reldif}^{C} = \dfrac{SynDSE - C}{C} \times 100\%.$

However, one of the implications of alternative formula 1 is that the number
of data-defined persons DD is a more basic quantity. Therefore we use DD as the
base, and the relative percent difference is then defined as

(17) $\mathrm{reldif} = \dfrac{SynDSE - DD}{DD} \times 100\%.$
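
The two choices of base can be compared with a short calculation. The Python sketch below evaluates the relative percent difference for one hypothetical county-group, once with the census count as base and once with the data-defined count; the numbers and names are made up for illustration.

```python
def relative_percent_difference(syn_dse, base):
    """(SynDSE - base) / base * 100, for base = C or base = DD."""
    if base <= 0:
        raise ValueError("the base count must be positive")
    return (syn_dse - base) / base * 100.0


# Hypothetical county-group.
syn_dse = 740_000.0
census_count = 730_000.0      # C, includes imputations
data_defined = 708_000.0      # DD = C - II

reldif_c = relative_percent_difference(syn_dse, census_count)
reldif_dd = relative_percent_difference(syn_dse, data_defined)
print(round(reldif_c, 2), round(reldif_dd, 2))
```

Because DD never exceeds C, the DD-based measure is never smaller than the census-based one, and the gap between them grows with the local imputation rate.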

To account for the implication of imputation, (17) can be modified to be a 
measure called state adjusted difference (SAD), which is defined by 

In (18), j is the county-group index, s is the state index. The following Table 3 
illustrates the descriptive statistics for SAD using different formulas. 

As we already found in the last section, alternative formula 3 gives
results very similar to those of the Census Bureau's formula. It is also noticeable from the table that
overall there is no substantial difference in terms of the mean value of differences.
[The results from $\mathrm{reldif}^{C}$ can be found in Table 7 in the Appendix, and they
give similar relative conclusions among county groups within a state.]



Table 3

Distribution of state adjusted difference at county group level

                     Min     Max    Median   Mean          SD
CB's formula        -2.97    7.38   0.98     [illegible]   [illegible]
Alter. formula 1    -2.95    4.93   1.20     1.18          1.10
Alter. formula 2    -3.08    9.80   0.81     1.15          1.60
Alter. formula 3    -2.98    7.29   0.99     1.14          1.39

[Figure 2: state adjusted difference plotted against county groups (horizontal axis: County groups).]

Fig 2. State adjusted difference - New York (DD base).

It is impossible to visually show the results of SAD from all county-groups in 
one figure; instead we illustrate the results in the following three states: 

1. New York: because of the large discrepancy in share comparison (Figure 1)
and its relatively large size (third-largest state)

2. New Jersey: because of the interesting phenomenon discussed in Section 5.2

3. California: because of its relatively large size (largest state)

Figure 2 is the plot of SAD in each county-group in New York. [The table
generating this figure can be found in the Appendix.] Each of the 21 points on
the X-axis represents a county group, and the state adjusted differences represented
on the Y-axis are connected by a line. Different line types represent different
formulas. Again, the results from alternative formula 3 are not shown in the figure
because they are very close to those from the Census Bureau's formula. For the
three counties in New York City (Bronx, Kings and Queens), which have a very
large percentage of imputations, the differences are clearly much higher than
those for the other county-groups.

Figure 3 is the plot of SAD in each county-group in New Jersey. Despite the 
fact that New Jersey shares a lot of post-strata with New York, the scale of the 
differences is much smaller than that from New York. 

Figure 4 is the plot of SAD in each county-group in California. From all three 
figures, it can be seen that most of the time, the lines using Census Bureau's formula 
lie between the lines using our alternative formula 1 and alternative formula 2. 

Fig 3. State adjusted difference - New Jersey (DD base). [x-axis: county groups]

This confirms that the Census Bureau's formula is something of a compromise
between the two alternatives.
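
As a small illustration of this "compromise" pattern, the Python sketch below counts how often the Census Bureau's value lies between the two alternatives, using the New Jersey state adjusted differences listed in Table 8 of the Appendix. The variable and function names are illustrative, and the check covers only this one state.

```python
# State adjusted differences for New Jersey county groups (Table 8):
# (CB's formula, alternative formula 1, alternative formula 2)
nj_sad = {
    "Hudson": (4.273, 3.507, 4.583),
    "Essex": (3.255, 3.502, 3.206),
    "Passaic": (2.693, 2.398, 2.877),
    "Somerset, Union": (1.521, 1.540, 1.613),
    "Mercer": (1.432, 1.367, 1.732),
    "Atlantic, Cape May & Cumberland, Salem": (1.159, 1.103, 1.282),
    "Camden, Gloucester": (0.858, 0.787, 1.190),
    "Bergen": (0.271, 0.656, 0.196),
    "Middlesex, Monmouth": (0.254, 0.851, 0.138),
    "Burlington, Ocean": (0.143, 0.270, 0.164),
    "Morris": (0.108, 0.437, 0.054),
    "Sussex, Warren": (-0.036, 0.930, -0.174),
    "Hunterdon": (-0.649, -0.057, -0.674),
}


def lies_between(cb, alt1, alt2):
    """True when the CB value is bracketed by the two alternative values."""
    return min(alt1, alt2) <= cb <= max(alt1, alt2)


inside = [name for name, (cb, a1, a2) in nj_sad.items() if lies_between(cb, a1, a2)]
outside = [name for name in nj_sad if name not in inside]

print(f"CB's value lies between the alternatives in {len(inside)} of {len(nj_sad)} county groups")
print("Exceptions:", outside)
```

For New Jersey the Census Bureau's value is bracketed in 11 of the 13 county groups; the two exceptions are Somerset, Union and Burlington, Ocean, where it falls slightly below both alternatives.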

It can also be seen that in general, at the lower end of the figure (smaller dif- 
ference between SynDSE and DD), the difference using Census Bureau's formula 
tends to be lower (higher) than that using alternative formula 1 (using alternative 
formula 2), while at the upper end of the figure (larger difference between SynDSE 
and DD), the difference using Census Bureau's formula tends to be higher (lower) 
than that using alternative formula 1 (using alternative formula 2). (The detailed
results for each county group in these three states can be found in Tables 8 through
10 in the Appendix.)



7. Comparison of different formulas 
7.1. Comparison of four formulas

As stated earlier, if the imputation rates were stochastically homogeneous with 
respect to the census count, then all the formulas would have the same expectation. 

It is easy to prove that if — = — ^, then Si ^ Si = S^ = Sk- 
ill L-i 

Fig 4. State adjusted difference - California (DD base).



7.2. When is DCF better 

Our alternative formula (10) uses DCF instead of CCF. One may ask under
which conditions DCF performs better than CCF.

Consider the following simpler case: there are two states for a single
post-stratum, and there are no people who moved between census day and the
A.C.E. interview. The corresponding counts in states 1 and 2 within the post-stratum
are $CE_1, CE_2, EE_1, EE_2, MN_1, MN_2, NN_1, NN_2, II_1, II_2$, and they are all
observable. Here $CE_j$, $EE_j$, $MN_j$, $NN_j$, and $II_j$ ($j = 1, 2$) denote the numbers
of correct enumerations, erroneous enumerations, matched non-movers, unmatched
non-movers, and imputations, respectively. For a formal definition of these types
of counts, see Norwood and Citro [11]. As also shown in Norwood and Citro [11],
CCF and DCF can be written as functions of these five types of counts:

(19)   $CCF = \dfrac{CE_1 + CE_2}{CE_1 + CE_2 + EE_1 + EE_2 + II_1 + II_2} \times \dfrac{MN_1 + MN_2 + NN_1 + NN_2}{MN_1 + MN_2},$

(20)   $DCF = \dfrac{CE_1 + CE_2}{CE_1 + CE_2 + EE_1 + EE_2} \times \dfrac{MN_1 + MN_2 + NN_1 + NN_2}{MN_1 + MN_2}.$

To further simplify the case, we assume that the two states are equal in size, i.e.
$CE_1 = CE_2$, $MN_1 = MN_2$, $NN_1 = NN_2$.

The following analysis compares the squared errors resulting from the
use of (3) and (10). In order to make this comparison it is necessary to make some
assumptions about the true population. The analysis is somewhat simpler under the
plausible assumption that the unbiased DSE from the two by two tables within
each state describes the true population parameters. A similar analysis is possible
under other assumptions.

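The unbiased DSE from the actual two by two tables within each state can be
written as

$$S_j^t = CE_j \times \frac{MN_j + NN_j}{MN_j}, \qquad j = 1, 2,$$

so that, under the equal-size assumption, $S_1^t = S_2^t = S$.

The synthetic DSEs for states 1 and 2 within the post-stratum calculated from CCF
and DCF (using alternative formula one) are

$$S_j^c = (CE_j + EE_j + II_j) \times CCF = \frac{2S\,(CE_j + EE_j + II_j)}{CE_1 + CE_2 + EE_1 + EE_2 + II_1 + II_2}$$

and

$$S_j^d = (CE_j + EE_j) \times DCF = \frac{2S\,(CE_j + EE_j)}{CE_1 + CE_2 + EE_1 + EE_2}.$$

Define the variance, i.e. the squared error of the synthetic DSE from the true
population, as

$$A_c = (S_1^c - S_1^t)^2 + (S_2^c - S_2^t)^2 = 2S^2 \left( \frac{CE_1 + EE_1 + II_1 - (CE_2 + EE_2 + II_2)}{CE_1 + EE_1 + II_1 + CE_2 + EE_2 + II_2} \right)^2,$$

$$A_d = (S_1^d - S_1^t)^2 + (S_2^d - S_2^t)^2 = 2S^2 \left( \frac{CE_1 + EE_1 - (CE_2 + EE_2)}{CE_1 + EE_1 + CE_2 + EE_2} \right)^2.$$

The difference of $A_d$ and $A_c$ is

(23)
$$\begin{aligned}
A_d - A_c &= 2S^2 \left\{ \left( \frac{EE_1 - EE_2}{2CE_1 + EE_1 + EE_2} \right)^2 - \left( \frac{EE_1 + II_1 - (EE_2 + II_2)}{2CE_1 + EE_1 + EE_2 + II_1 + II_2} \right)^2 \right\} \\
&= 2S^2 \left\{ \left( \frac{EE_1 - EE_2}{2CE_1 + EE_1 + EE_2} + \frac{EE_1 + II_1 - EE_2 - II_2}{2CE_1 + EE_1 + EE_2 + II_1 + II_2} \right) \right. \\
&\qquad\quad \left. \times \left( \frac{EE_1 - EE_2}{2CE_1 + EE_1 + EE_2} - \frac{EE_1 + II_1 - EE_2 - II_2}{2CE_1 + EE_1 + EE_2 + II_1 + II_2} \right) \right\}.
\end{aligned}$$

If $CE \gg (EE, II)$, as is usually the case, then

(24)
$$A_d - A_c \approx \frac{-2S^2 \left( 4CE_1 (EE_1 - EE_2) + 2CE_1 (II_1 - II_2) \right) \left( 2CE_1 (II_1 - II_2) \right)}{(2CE_1 + EE_1 + EE_2)^2 \, (2CE_1 + EE_1 + EE_2 + II_1 + II_2)^2}.$$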

Table 4

Frequency table of better performance of DCF among large/small post-strata

           CCF    DCF    Total
Small       48     89      137
Large       23     84      107
Total       71    173      244



From (24) we have

• If $EE_1 = EE_2$ then $A_d - A_c < 0$, DCF is better.

• If $II_1 = II_2$ then $A_d - A_c > 0$ (by (23), since the leading term in (24) vanishes), CCF is better.

• If $EE_1 \neq EE_2$ and $II_1 \neq II_2$:

- If $EE_1 > EE_2$ and $II_1 > II_2$ then $A_d - A_c < 0$, DCF is better.

- If $EE_1 > EE_2$ and $II_1 < II_2$:

* If $EE_1 - EE_2 < -\frac{II_1 - II_2}{2}$ then $A_d - A_c < 0$, DCF is better.

* If $EE_1 - EE_2 > -\frac{II_1 - II_2}{2}$ then $A_d - A_c > 0$, CCF is better.
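
The cases above can be checked numerically. The Python sketch below computes the two squared errors exactly for the equal-size, two-state setting, taking the within-state unbiased DSE as the true population and applying CCF to the census count and DCF to the data-defined count. The function names, the default values for matched and unmatched non-movers, and the example counts are illustrative assumptions, not data from the A.C.E.

```python
def squared_errors(ce, ee1, ee2, ii1, ii2, mn=90_000.0, nn=10_000.0):
    """Return (A_c, A_d) for two equal-sized states sharing one post-stratum.

    ce, mn, nn are common to both states; ee and ii may differ by state.
    """
    ratio = (mn + nn) / mn                 # (MN + NN) / MN, same in both states
    true_dse = ce * ratio                  # unbiased within-state DSE
    ccf = 2 * ce / (2 * ce + ee1 + ee2 + ii1 + ii2) * ratio   # equation (19)
    dcf = 2 * ce / (2 * ce + ee1 + ee2) * ratio               # equation (20)
    # Synthetic DSE: census count times CCF, or data-defined count times DCF.
    a_c = sum(((ce + ee + ii) * ccf - true_dse) ** 2
              for ee, ii in ((ee1, ii1), (ee2, ii2)))
    a_d = sum(((ce + ee) * dcf - true_dse) ** 2 for ee in (ee1, ee2))
    return a_c, a_d


def better(ce, ee1, ee2, ii1, ii2):
    """Name the correction factor with the smaller squared error."""
    a_c, a_d = squared_errors(ce, ee1, ee2, ii1, ii2)
    return "DCF" if a_d < a_c else "CCF"


# Hypothetical counts (ee1, ee2, ii1, ii2), one per case in the list above.
cases = {
    "EE equal": (3000, 3000, 4000, 2000),
    "II equal": (4000, 2000, 3000, 3000),
    "EE1 > EE2, II1 > II2": (4000, 2000, 4000, 2000),
    "EE1 > EE2, II1 < II2, small EE gap": (3000, 2000, 1000, 5000),
    "EE1 > EE2, II1 < II2, large EE gap": (5000, 2000, 1000, 5000),
}

for label, counts in cases.items():
    print(f"{label}: {better(100_000, *counts)} is better")
```

The example counts are chosen well away from the boundary $EE_1 - EE_2 = -(II_1 - II_2)/2$, because that threshold comes from the approximation (24) and the exact sign can differ very close to it.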

More generally, we assume $CE_2 = \lambda CE_1$, $MN_2 = \lambda MN_1$, $NN_2 = \lambda NN_1$, since the
homogeneity assumption appears to hold for the two largest groups: CE and MN.
For the setup and results from the test of the homogeneity assumption, see Zhao [14].
Similarly we have

• If $\lambda EE_1 = EE_2$ then $A_d - A_c < 0$, DCF is better.

• If $\lambda II_1 = II_2$ then $A_d - A_c > 0$, CCF is better.

• If $\lambda EE_1 \neq EE_2$ and $\lambda II_1 \neq II_2$:

- If $\lambda EE_1 > EE_2$ and $\lambda II_1 > II_2$ then $A_d - A_c < 0$, DCF is better.

- If $\lambda EE_1 > EE_2$ and $\lambda II_1 < II_2$:

* If $\lambda EE_1 - EE_2 < -\frac{\lambda II_1 - II_2}{2}$ then $A_d - A_c < 0$, DCF is better.

* If $\lambda EE_1 - EE_2 > -\frac{\lambda II_1 - II_2}{2}$ then $A_d - A_c > 0$, CCF is better.
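
The threshold in the general case can be traced with a short derivation. The sketch below is a reconstruction under the stated proportionality assumption, with the within-state unbiased DSE taken as the true population; the symbols $S$, $T_c$, $T_d$, $x$ and $y$ are shorthand introduced here for illustration and do not appear elsewhere in the text.

```latex
% Shorthand (introduced here):
%   S   = CE_1 (MN_1 + NN_1) / MN_1, so the true values are S and \lambda S
%   T_c = (1+\lambda) CE_1 + EE_1 + EE_2 + II_1 + II_2
%   T_d = (1+\lambda) CE_1 + EE_1 + EE_2
%   x   = \lambda EE_1 - EE_2, \qquad y = \lambda II_1 - II_2
\begin{align*}
S_1^c - S &= S\,\frac{(1+\lambda)(CE_1 + EE_1 + II_1) - T_c}{T_c}
           = S\,\frac{x + y}{T_c},
  & S_2^c - \lambda S &= -S\,\frac{x + y}{T_c}, \\
S_1^d - S &= S\,\frac{(1+\lambda)(CE_1 + EE_1) - T_d}{T_d}
           = S\,\frac{x}{T_d},
  & S_2^d - \lambda S &= -S\,\frac{x}{T_d}, \\
A_d - A_c &= 2S^2\left\{ \left(\frac{x}{T_d}\right)^2
             - \left(\frac{x + y}{T_c}\right)^2 \right\}
  \approx \frac{-2S^2\, y\,(2x + y)}{\bigl((1+\lambda)\,CE_1\bigr)^2}
  && \text{when } CE_1 \gg (EE, II).
\end{align*}
% Sign: with x > 0 and y < 0, A_d - A_c < 0 exactly when 2x + y < 0,
% i.e. when \lambda EE_1 - EE_2 < -(\lambda II_1 - II_2)/2.
```

Setting $\lambda = 1$ recovers the equal-size conditions.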

The above discussion gives conditions under which the Census Bureau's
correction factor (4) or the alternative correction factor (9) performs better than the
other. To show empirical results from the data, consider a simple case.
Suppose we regard New York state as state 1 and all the other states together
as state 2, and calculate the DCF and CCF for the 244 post-strata that are
present in both. We found that DCF is better in about 71% of these post-strata
(173 of 244). Furthermore, if we categorize the post-strata into two
groups, large post-strata (having more than 50,000 correct enumerations) and small
post-strata, DCF performs better more often in the large post-strata.

From Table 4, it can be seen that DCF (corresponding to formula (10)) performs
better about 65% of the time in small post-strata (89 of 137) and about 79% of
the time in large post-strata (84 of 107).
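
The percentages can be recomputed directly from the counts in Table 4. The Python sketch below does so; the dictionary layout and function name are illustrative.

```python
# Counts from Table 4: number of post-strata in which each factor performs better.
table4 = {
    "Small": {"CCF": 48, "DCF": 89},
    "Large": {"CCF": 23, "DCF": 84},
}


def dcf_share(counts):
    """Percentage of post-strata in which DCF performs better."""
    return 100.0 * counts["DCF"] / (counts["CCF"] + counts["DCF"])


shares = {size: dcf_share(counts) for size, counts in table4.items()}

# Pool the two size groups to get the overall share.
overall = {
    "CCF": sum(c["CCF"] for c in table4.values()),
    "DCF": sum(c["DCF"] for c in table4.values()),
}
shares["All"] = dcf_share(overall)

for size, pct in shares.items():
    print(f"{size}: DCF better in {pct:.1f}% of post-strata")
```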

8. Conclusion 

The major purpose of this paper is to better understand the 2000 A.C.E. process by 
providing alternative formulas. To construct these three formulas, alternate forms 
of the synthetic assumption are used, and the structure of imputation is analyzed. 
We find that the alternative estimation formulas also seem justifiable.

It is perhaps hard to tell which formula gives generally more accurate results.
It appears to us that each one has its own merit and none dominates another.
In addition, there seems to be no way with existing data to compare the biases of the
formulas. Nonetheless, it appears that the first of the alternatives would achieve
smaller variance than the Census Bureau's formula if the number of erroneous
enumerations and the number of imputations are positively correlated, which
holds true in most cases.

What we do observe is that the Census Bureau's formula tends to be a compro- 
mise among the three alternatives. For this reason it seems to us reasonable to stick 
to the original one, especially in view of a lack of further evidence. 

Both the Census Bureau's formula and our alternative formulas use the total
number of imputations to create population estimates. As noted in Section 2, there are
different classes of imputation. It may be preferable to use only some subsets of
imputations, and to create formulas in different ways.

Finally we want to point out that the correct enumeration rate CE/(CE + EE)
is estimated in producing synthetic estimates. This estimate is another potential
source of heterogeneity, and the related synthetic assumption on it should be
studied. A valid, formal test of the hypothesis that the correct enumeration rate is
geographically homogeneous within post-strata for states or counties can be derived
using the methods of Zhao [14]. This test shows there is significant non-homogeneity.
(The details of this test will be reported elsewhere.) It would be desirable to also
see how this inhomogeneity affects synthetic estimates. However, unlike II,
the components CE and EE are not measured for the entire census, but rather
only for the A.C.E. sample blocks. Thus it is unclear how to use existing data to
create estimates related to this factor.

Appendix 



Table 5. Schematic for post-stratification variables (see Section 3.1 for further description)

(MSA: Metropolitan Statistical Area; TEA: Type of Enumeration Area; MO/MB: Mail out/Mail back)

Row headings: Race/Hispanic Origin Domain number; Tenure; MSA/TEA.
Column headings: High return rate (NE, MW, S, W); Low return rate (NE, MW, S, W).

Race/Hispanic Origin domains:
  Domain 7: Non-Hispanic White and Other
  Domain 4: Non-Hispanic Black
  Domain 5: Native Hawaiian or Pacific Islander
  Domain 6: Non-Hispanic Asian
  Domain 3: Hispanic
  Domain 1: On Reservation American Indian or Alaska Native
  Domain 2: Off Reservation American Indian or Alaska Native

Tenure: Owner; Non-Owner.
MSA/TEA: Large MSA MO/MB; Medium MSA MO/MB; Small MSA & Non-MSA MO/MB; All Other TEAs.

The cells of the schematic are numbered 1 through 64.

Table 6
Imputation rates for 51 states

State   II(Tot)   II(Non-LA)   II(LA)   Census Share   Number of      Mean II(Tot) of
                                                       post-strata    post-strata
NY      4.913     3.201        1.712     6.724         256            5.511
NM      4.474     2.895        1.579     0.652         236            4.137
HI      4.247     2.913        1.334     0.430         222            4.781
WY      3.921     2.588        1.333     0.175         148            4.166
NV      3.918     3.257        0.661     0.718         236            4.479
AZ      3.891     3.145        0.746     1.835         236            4.942
VT      3.887     2.223        1.664     0.215         134            3.961
DC      3.860     3.726        0.134     0.196         144            4.373
TX      3.508     2.633        0.875     7.417         264            3.962
AL      3.491     2.212        1.279     1.584         235            4.104
IL      3.383     2.469        0.914     4.422         246            4.380
DE      3.339     2.901        0.438     0.277         222            4.900
RI      3.302     2.360        0.942     0.369         221            4.821
GA      3.300     2.349        0.951     2.907         234            3.994
CA      3.255     2.720        0.535    12.08          260            4.360
SC      3.221     2.145        1.076     1.417         236            4.423
MD      3.074     2.503        0.572     1.887         222            4.178
NH      3.056     1.987        1.070     0.439         217            4.440
MT      3.039     1.583        1.456     0.321         152            4.478
MS      3.038     1.677        1.360     1.005         228            3.289
LA      2.886     1.886        1.000     1.584         236            3.432
NJ      2.869     2.008        0.861     3.004         194            4.849
AR      2.810     1.403        1.407     0.950         222            2.852
NC      2.795     1.640        1.156     2.849         236            3.558
CO      2.786     2.039        0.747     1.535         236            4.119
IN      2.700     2.202        0.498     2.157         246            4.106
FL      2.672     2.113        0.558     5.700         236            4.534
ME      2.604     1.258        1.345     0.453         184            2.889
AK      2.584     1.385        1.199     0.202         152            3.131
ID      2.554     1.821        0.733     0.461         152            3.374
WV      2.506     0.856        1.651     0.645         215            2.625
MA      2.468     1.558        0.909     2.240         253            3.239
TN      2.465     1.599        0.867     2.025         222            3.372
KY      2.447     1.164        1.283     1.435         222            2.888
WA      2.407     1.894        0.513     2.105         236            3.410
CT      [row illegible in source]
UT      2.369     1.765        0.604     0.801         236            2.972
SD      2.362     1.392        0.970     0.266         140            2.733
PA      2.322     1.574        0.748     4.331         242            3.595
VA      2.283     1.555        0.727     2.503         260            3.536
OK      2.261     1.282        0.979     1.220         236            2.517
OR      2.260     1.711        0.549     1.222         236            3.639
WI      2.153     1.600        0.553     1.903         258            4.311
MO      2.098     1.200        0.898     1.986         222            3.050
ND      1.985     1.003        0.983     0.226         138            2.342
KS      1.904     1.227        0.678     0.953         236            2.739
MI      1.876     1.341        0.536     3.541         260            3.039
MN      1.873     1.237        0.636     1.748         236            3.463
OH      1.680     1.123        0.557     4.040         222            2.699
IA      1.629     0.963        0.666     1.032         215            2.589
NE      1.608     0.994        0.615     0.607         236            2.487

Table 7

Distribution of relative difference between census and SynDSE at county group level

                       Min     Max     Median   Mean    SD
CB's formula          -0.13    2.96    1.15     1.14    0.54
Alter. formula 1      -3.26    3.78    1.22     1.18    0.84
Alter. formula 2      -0.14    4.31    1.06     1.13    0.68
Alter. formula 3      -0.13    2.97    1.15     1.14    0.54



Table 8

County group level results in New Jersey
(In Tables 8 through 10, the second column "CB's" lists the results using the Census Bureau's
formula, the third column "Alter. 1" lists the results using alternative formula 1, and the fourth
column "Alter. 2" lists the results using alternative formula 2.)

Relative difference in New Jersey (census as the base)

Counties                                      CB's     Alter. 1   Alter. 2   Census     II/Census
Passaic                                       1.566    1.282      1.743      479073     3.863
Essex                                         1.471    1.706      1.423      770844     4.462
Hudson                                        1.470    0.745      1.763      599525     5.369
Somerset, Union                               1.250    1.268      1.339      807714     3.087
Atlantic, Cape May &
Cumberland, Salem                             1.234    1.179      1.354      542964     2.766
Mercer                                        1.223    1.160      1.514      329669     3.030
Middlesex, Monmouth                           1.123    1.707      1.009      1334607    2.021
Morris                                        1.064    1.387      1.011      461026     1.938
Sussex, Warren                                0.973    1.920      0.838      243450     1.890
Bergen                                        0.967    1.344      0.894      872769     2.187
Burlington, Ocean                             0.964    1.089      0.985      912247     2.068
Camden, Gloucester                            0.951    0.882      1.274      747998     2.756
Hunterdon                                     0.797    1.380      0.772      117643     1.474

State adjusted difference in New Jersey

Counties                                      CB's     Alter. 1   Alter. 2   DD         II/DD
Hudson                                        4.273    3.507      4.583      567337     5.674
Essex                                         3.255    3.502      3.206      736452     4.670
Passaic                                       2.693    2.398      2.877      460565     4.019
Somerset, Union                               1.521    1.540      1.613      782780     3.185
Mercer                                        1.432    1.367      1.732      319680     3.125
Atlantic, Cape May &
Cumberland, Salem                             1.159    1.103      1.282      527948     2.844
Camden, Gloucester                            0.858    0.787      1.190      727384     2.834
Bergen                                        0.271    0.656      0.196      853681     2.236
Middlesex, Monmouth                           0.254    0.851      0.138      1307639    2.062
Burlington, Ocean                             0.143    0.270      0.164      893380     2.112
Morris                                        0.108    0.437      0.054      452090     1.977
Sussex, Warren                               -0.036    0.930     -0.174      238849     1.926
Hunterdon                                    -0.649   -0.057     -0.674      115909     1.496

Table 9

County group level results in New York

Relative difference in New York (census as the base)

Counties                                      CB's          Alter. 1      Alter. 2      Census        II/Census
Bronx                                         2.405         -1.893        4.313         1285410       9.016
Clinton, Franklin, Fulton, Hamilton &
Jefferson, Lewis, Oswego, St Lawrence         1.683         1.512         [illegible]   [illegible]   [illegible]
Chenango, Delaware, Herkimer &
Madison, Oneida, Otsego, Schoharie            1.486         1.364         1.568         [illegible]   3.023
Broome, Sullivan, Tioga, Tompkins, Ulster     1.480         1.193         1.576         [illegible]   3.147
New York                                      [illegible]   [illegible]   1.088         1477358       2.887
Allegany, Cattaraugus, Chautauqua &
Chemung, Schuyler, Steuben, Yates             1.280         1.036         1.522         [illegible]   2.786
Dutchess, Putnam                              1.263         0.891         1.506         355568        3.702
Kings                                         [illegible]   [illegible]   1.717         2426027       9.817
Orange, Rockland                              1.212         [illegible]   1.430         [illegible]   3.851
Columbia, Essex, Greene, Rensselaer &
Saratoga, Warren, Washington                  1.095         [illegible]   1.326         603542        3.339
Westchester                                   0.999         1.162         [illegible]   [illegible]   [illegible]
Albany, Montgomery, Schenectady               0.999         [illegible]   [illegible]   [illegible]   2.602
Queens                                        0.945         [illegible]   [illegible]   [illegible]   [illegible]
Cayuga, Cortland, Onondaga                    0.794         [illegible]   [illegible]   [illegible]   [illegible]
Nassau                                        0.736         [illegible]   [illegible]   [illegible]   [illegible]
Monroe                                        0.735         [illegible]   [illegible]   [illegible]   [illegible]
Erie                                          0.684         [illegible]   [illegible]   [illegible]   [illegible]
Niagara, Orleans                              0.532         [illegible]   [illegible]   [illegible]   [illegible]
Suffolk                                       0.491         [illegible]   [illegible]   [illegible]   [illegible]
Genesee, Livingston, Ontario &
Seneca, Wayne, Wyoming                        0.252         [illegible]   [illegible]   [illegible]   [illegible]
Richmond                                     -0.035         [illegible]   [illegible]   [illegible]   [illegible]

State adjusted difference in New York

Counties                                      CB's          Alter. 1      Alter. 2      DD            II/DD
Bronx                                         7.386         2.662         9.482         1169523       9.909
Kings                                         7.110         2.100         7.622         2187875       10.885
Queens                                        4.517         [illegible]   5.030         2027022       8.657
Richmond                                      0.428         [illegible]   [illegible]   [illegible]   [illegible]
Orange, Rockland                              0.098         -0.367        0.325         [illegible]   4.005
Dutchess, Putnam                             -0.011         [illegible]   0.241         342405        3.844
Clinton, Franklin, Fulton, Hamilton &
Jefferson, Lewis, Oswego, St Lawrence        -0.203         [illegible]   -0.062        508331        3.227
Broome, Sullivan, Tioga, Tompkins, Ulster    -0.390         [illegible]   -0.291        [illegible]   3.249
Chenango, Delaware, Herkimer &
Madison, Oneida, Otsego, Schoharie           -0.517         -0.643        -0.433        514779        3.117
Columbia, Essex, Greene, Rensselaer &
Saratoga, Warren, Washington                 -0.580         -1.175        -0.340        [illegible]   3.455
New York                                     -0.673         0.908         -1.073        1434701       2.973
Westchester                                  -0.882         -0.714        -0.728        871460        3.253
Nassau                                       -0.906         -1.742        -0.443        1268496       3.499
Allegany, Cattaraugus, Chautauqua &
Chemung, Schuyler, Steuben, Yates            -0.985         -1.236        -0.736        470993        2.865
Suffolk                                      -1.457         -1.798        -0.987        1347631       3.203
Albany, Montgomery, Schenectady              -1.470         -1.400        -1.575        457183        2.672
Erie                                         -1.708         -2.351        -1.445        894808        2.757
Cayuga, Cortland, Onondaga                   -2.123         -1.927        -2.109        555079        2.232
Niagara, Orleans                             -2.601         -2.826        -2.846        251228        2.024
Monroe                                       -2.885         -2.042        -3.082        698113        1.536
Genesee, Livingston, Ontario &
Seneca, Wayne, Wyoming                       -2.974         -2.953        -2.852        369250        1.930

Table 10

County group level results in California

Relative difference in California (census as the base)

Counties                                      CB's     Alter. 1   Alter. 2   Census        II/Census
Imperial                                      2.959    3.589      2.783      131317        4.343
Kings                                         2.384    1.891      2.827      109332        4.114
San Luis Obispo, Santa Barbara                2.191    2.257      1.911      613840        2.995
Monterey, San Benito, Santa Cruz              2.086    1.607      2.325      680087        4.018
Merced, Stanislaus                            2.050    1.848      2.198      647207        3.538
Del Norte, Humboldt, Lake, Mendocino, Napa    1.918    1.920      1.865      406509        3.242
Kern, Tulare                                  1.913    1.233      2.213      993655        4.352
Los Angeles                                   1.829    1.727      1.776      9344086       3.529
Butte, Lassen, Modoc, Nevada, Plumas &
Shasta, Sierra, Siskiyou, Trinity, Yuba       1.801    2.345      1.353      621777        2.431
Fresno, Madera, Mariposa                      1.661    0.448      2.228      912453        4.657
Colusa, Glenn, Sutter, Tehama, Yolo           1.577    1.935      1.325      338148        2.704
San Francisco                                 1.572    0.623      1.900      [illegible]   4.283
Inyo, San Bernardino                          1.542    1.817      1.349      1682190       3.135
Alameda                                       1.408    1.969      1.186      1416006       2.757
San Joaquin                                   1.390    0.771      1.581      544827        3.827
Riverside                                     1.380    1.404      1.395      1511034       3.179
Santa Clara                                   1.282    1.223      1.361      1652871       3.081
Orange                                        1.275    0.953      1.483      2803924       3.216
San Diego                                     1.228    1.623      0.989      2716820       2.616
San Mateo                                     1.192    0.841      1.311      696711        3.252
Ventura                                       1.131    1.150      1.189      739985        2.729
Sacramento                                    1.105    1.250      0.966      1198004       2.702
Contra Costa, Solano                          1.065    1.612      0.868      1316047       2.330
Alpine, Amador, Calaveras, El Dorado &
Mono, Placer, Tuolumne                        0.987    0.778      1.073      534773        3.136
Marin, Sonoma                                 0.953    1.144      0.909      683315        2.365

State adjusted difference in California

Counties                                      CB's     Alter. 1   Alter. 2   DD            II/DD
Imperial                                      4.269    4.928      4.085      125614        4.540
Kings                                         3.412    2.898      3.874      104834        4.291
Fresno, Madera, Mariposa                      3.262    1.990      3.857      869960        4.884
Kern, Tulare                                  3.186    2.475      3.500      950411        4.550
Monterey, San Benito, Santa Cruz              2.995    2.496      3.244      652762        4.186
San Francisco                                 2.753    1.762      3.090      724551        4.475
Merced, Stanislaus                            2.429    2.219      2.582      624309        3.668
Los Angeles                                   2.189    2.084      2.135      9014370       3.658
San Joaquin                                   2.060    1.417      2.259      523974        3.980
San Luis Obispo, Santa Barbara                1.982    2.049      1.693      595458        3.087
Del Norte, Humboldt, Lake, Mendocino, Napa    1.968    1.970      1.913      393332        3.350
Inyo, San Bernardino                          1.464    1.748      1.265      1629458       3.236
Riverside                                     1.344    1.369      1.360      1462999       3.283
Orange                                        1.276    0.944      1.491      2713751       3.323
San Mateo                                     1.228    0.866      1.351      674056        3.361
Santa Clara                                   1.137    1.076      1.219      1601952       3.179
Colusa, Glenn, Sutter, Tehama, Yolo           1.036    1.404      0.777      329003        2.780
Butte, Lassen, Modoc, Nevada, Plumas &
Shasta, Sierra, Siskiyou, Trinity, Yuba       0.972    1.530      0.513      606664        2.491
Alameda                                       0.920    1.496      0.691      1376961       2.836
Alpine, Amador, Calaveras, El Dorado &
Mono, Placer, Tuolumne                        0.892    0.677      0.981      518002        3.238
Ventura                                       0.604    0.623      0.664      719791        2.806
San Diego                                     0.584    0.988      0.337      2645741       2.687
Sacramento                                    0.548    0.698      0.407      1165633       2.777
Contra Costa, Solano                          0.112    0.672     -0.090      1285381       2.386
Marin, Sonoma                                 0.034    0.230     -0.011      667156        2.422